library(bayesrules) 
library(tidyverse) 
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.5     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.0.2     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(rstan) 
## Loading required package: StanHeaders
## rstan (Version 2.21.2, GitRev: 2e1f913d3ca3)
## For execution on a local, multicore CPU with excess RAM we recommend calling
## options(mc.cores = parallel::detectCores()).
## To avoid recompilation of unchanged Stan programs, we recommend calling
## rstan_options(auto_write = TRUE)
## 
## Attaching package: 'rstan'
## The following object is masked from 'package:tidyr':
## 
##     extract
library(bayesplot) 
## This is bayesplot version 1.8.1
## - Online documentation and vignettes at mc-stan.org/bayesplot
## - bayesplot theme set to bayesplot::theme_default()
##    * Does _not_ affect other ggplot2 plots
##    * See ?bayesplot_theme_set for details on theme setting
library(broom.mixed) 
library(janitor)
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

Exercise 8.1

Posterior estimation, posterior hypothesis testing, and posterior prediction.

Exercise 8.2

  1. By reporting only the central tendency of the λ posterior model, namely the posterior mean and mode of λ, we can infer only that λ is most likely to be around a certain value, while neglecting the uncertainty in λ: it can vary over a plausible range of values (the posterior credible interval for λ), which captures the more plausible values of λ while eliminating the more extreme and unlikely scenarios. Failing to report both the central tendency and the variability in λ leaves us unable to see the full picture.

  2. There’s a 95% posterior probability that λ is somewhere between roughly 1 and 3.4.

Exercise 8.3

  1. It could be addressed using a hypothesis test. Let π denote the proportion of dogs at the dog park that do not have a dog license. We can frame our analysis as two competing hypotheses: the null hypothesis H0 contends that dogs at the dog park that do not have a dog license make up at most 40% of the total dogs at the dog park whereas the alternative hypothesis Ha (Trichelle’s claim) contends that this figure is above 40%. In mathematical notation:

H0: π ≤ 0.4

Ha: π > 0.4

We can then conduct the one-sided hypothesis test by calculating the prior and posterior probabilities of the null and alternative hypotheses before and after observing data. Next, we can divide the prior (or posterior) probability of Ha by that of H0 to obtain the odds, which indicate how much more or less likely π is to be above 0.4 than at or below 0.4. Thus we can see whether the data provide evidence for or against Trichelle’s claim. We can also divide the posterior odds by the prior odds to see how much our understanding evolved after observing the data.

  2. We can address the issue with competing hypotheses. For instance, let π denote the proportion of students at the large university who have heard of Bayesian statistics. If we want to test whether or not a certain percentage X1 of the students have heard of Bayesian statistics, this is a two-sided hypothesis test:

H0: π = X1

Ha: π ≠ X1

Based on the prior and observed data, we can report a credible interval (CI) for π, a range of posterior plausible π values. Then, using a pre-defined window around X1 that defines the range of values considered meaningfully different from X1, we check whether the hypothesized value X1 falls substantially outside the posterior credible interval; if so, we have ample evidence in favor of Ha.

  3. It could be addressed using a hypothesis test. Let π denote the proportion of voters in the environmental justice advocate’s state who support the new regulation. We can frame our analysis as two competing hypotheses: the null hypothesis H0 contends that supporters make up at most 60% of the state’s voters, whereas the alternative hypothesis Ha contends that this figure is above 60%. In mathematical notation:

H0: π ≤ 0.6

Ha: π > 0.6

We can then conduct the one-sided hypothesis test by calculating the prior and posterior probabilities of the null and alternative hypotheses before and after observing data. Next, we can divide the prior (or posterior) probability of Ha by that of H0 to obtain the odds, which indicate how much more or less likely π is to be above 0.6 than at or below 0.6. Thus we can see whether the data provide evidence for or against Ha. We can also divide the posterior odds by the prior odds to see how much our understanding evolved after observing the data.

  4. It could be addressed using a hypothesis test. Let λ denote the number of times Ptolemy uses a certain mode of argument per page of the Syntaxis Mathematica. Based on Ptolemy’s other writings, Sarah thinks he typically uses this mode of argument about 3 times per page. Sarah can then record sample data from the 13 volumes of the Syntaxis Mathematica: how many times Ptolemy uses the mode of argument per page across the 90 pages. Next, she can frame her analysis as two competing hypotheses: the null hypothesis H0 contends that the rate λ is at most 2, whereas the alternative hypothesis Ha contends that it is above 2. In mathematical notation:

H0: λ ≤ 2

Ha: λ > 2

We can then conduct the one-sided hypothesis test by calculating the prior and posterior probabilities of the null and alternative hypotheses before and after observing data. Next, we can divide the prior (or posterior) probability of Ha by that of H0 to obtain the odds, which indicate how much more or less likely λ is to be above 2 than at or below 2. Thus we can see whether the data provide evidence for or against Ha. We can also divide the posterior odds by the prior odds to see how much our understanding evolved after observing the data.
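As a numerical sketch of this workflow, suppose (hypothetically; the exercise gives no such numbers) Sarah's prior for λ is Gamma(3, 1), with prior mean 3, and counting uses across the 90 pages yields a Gamma(273, 91) posterior:

```r
# Hypothetical Gamma(3, 1) prior and Gamma(273, 91) posterior for lambda
# (e.g., 270 total uses counted across 90 pages); illustrative numbers only.
prior_prob <- pgamma(2, 3, 1, lower.tail = FALSE)      # P(lambda > 2) under the prior
post_prob  <- pgamma(2, 273, 91, lower.tail = FALSE)   # P(lambda > 2) under the posterior
prior_odds <- prior_prob / (1 - prior_prob)
post_odds  <- post_prob / (1 - post_prob)
BF <- post_odds / prior_odds   # Bayes Factor: posterior odds relative to prior odds
c(prior_odds, post_odds, BF)
```

Under these made-up numbers the posterior odds dwarf the prior odds, so the Bayes Factor is far above 1 and the data would strongly favor Ha.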

Exercise 8.4

  1. If the possible outcomes of an event are either E or its complement E′ (also written Ec), the odds of E versus E′ are the ratio of their probabilities. After observing data, the posterior probability (the probability of a model given the observed data) updates the prior probability by taking the new information into account. So the posterior odds are the ratio of the posterior probabilities, P(E|D)/P(E′|D), indicating how much more or less likely E is than E′ according to our posterior understanding after observing the data.

  2. If the possible outcomes of an event are either E or its complement E′, the prior probability (the credibility of the model before observing the current data) of E or E′ represents what is believed before new evidence is introduced. The prior odds of E versus E′ are then the ratio of their prior probabilities, P(E)/P(E′), indicating how much more or less likely E is than E′ according to our prior understanding, before observing any data.

  3. The Bayes Factor is the ratio of the posterior odds to the prior odds. We might want to calculate it because by comparing the posterior odds to the prior odds, the Bayes Factor provides insight into how much our understanding about an event evolved upon observing the sample data. If BF = 1, the plausibility of E didn’t change in light of the observed data; if BF > 1, the plausibility of E increased in light of the observed data (the greater the Bayes Factor, the more convincing the evidence for E); if BF < 1, the plausibility of E decreased in light of the observed data.

Exercise 8.5

  1. Posterior predictive models first incorporate sampling variability in the data. Sampling variability refers to how much an estimate varies from sample to sample. Because a statistic takes on different values from sample to sample, the observed data will fluctuate depending upon which random sample from the target population we happen to get. We need to account for sampling variability so we know the extent to which a sample statistic approximates the true parameter.

Posterior predictive models also incorporate posterior variability in π. π can vary over a plausible range of values, namely anywhere in its posterior credible interval. Thus, when making predictions about new data, we need to consider which outcomes we’d expect under each possible π while accounting for the fact that some π values are more plausible than others.

  2. Suppose one wants to know the proportion of members of a particular church who are LGBT. Let π denote the proportion of members of the church who are LGBT. The Beta(1,6) prior model for π reflects our own vague prior assumption that the church will have few LGBT members given its tradition, i.e., π most likely falls below 0.5. To learn more about π, we examine n = 50 members sampled from the church, and it turns out that 15 of the 50 members in the sample are LGBT. Therefore, our updated posterior model of π in light of the observed data is Beta(16,41). Suppose we now get our hands on data for 20 more members of the church. Based on the posterior understanding of π that we’ve developed, we can predict the number of LGBT people among these 20 members via posterior prediction.

  3. A posterior predictive model is conditional on both the data and the parameter, since the goal of posterior prediction is to assess the fit between a model shaped by certain parameters and the observed data by answering the following question: Could the model we’ve assumed plausibly have produced the data we observed?
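To make the church example concrete, the posterior predictive model for the number of LGBT members among the 20 new members can be approximated by simulation from the Beta(16,41) posterior (a quick sketch; the seed is arbitrary):

```r
set.seed(84735)
N <- 10000
# Step 1: sample plausible pi values from the Beta(16, 41) posterior
pi_sim <- rbeta(N, 16, 41)
# Step 2: for each pi, simulate the LGBT count among 20 new members,
# capturing both posterior variability (in pi) and sampling variability (in Y)
y_new <- rbinom(N, size = 20, prob = pi_sim)
# Approximate posterior predictive pmf of the new count
table(y_new) / N
```

The simulated counts concentrate around 20 × 16/57 ≈ 5.6, the posterior predictive mean.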

Exercise 8.6

# 0.025th & 0.975th quantiles of the Beta(4,5) posterior
qbeta(c(0.025, 0.975), 4, 5)
## [1] 0.1570128 0.7551368

The resulting 95% credible interval for π is (0.16, 0.76).

# 0.20th & 0.80th quantiles of the Beta(4,5) posterior
qbeta(c(0.20, 0.80), 4, 5)
## [1] 0.3032258 0.5836554

The resulting 50% credible interval for π is (0.30, 0.58).

# 0.025th & 0.975th quantiles of the Gamma(1,8) posterior
qgamma(c(0.025, 0.975), 1, 8)
## [1] 0.003164726 0.461109932

The resulting 95% credible interval for λ is (0.003, 0.461).

Exercise 8.7

# 0.005th & 0.995th quantiles of the Gamma(1,5) posterior
qgamma(c(0.005, 0.995), 1, 5)
## [1] 0.001002508 1.059663473

The resulting 99% credible interval for λ is (0.001, 1.060).

# 0.025th & 0.975th quantiles of the N(10,2^2) posterior
# (qnorm takes the standard deviation, 2, not the variance 2^2)
qnorm(c(0.025, 0.975), 10, 2)
## [1]  6.080072 13.919928

The resulting 95% credible interval for μ is (6.08, 13.92).

# 0.10th & 0.90th quantiles of the N(-3,1^2) posterior
qnorm(c(0.10, 0.90), -3, 1^2)
## [1] -4.281552 -1.718448

The resulting 80% credible interval for μ is (-4.282, -1.718).

Exercise 8.8

plot_gamma(1,5)

# Sketch plot
knitr::include_graphics("image/CI_1.png")

# Simulation plot
gp_model <- "
data {
  int<lower = 0> Y[2];
}
parameters {
  real<lower = 0> lambda;
}
model {
  Y ~ poisson(lambda);
  lambda ~ gamma(1, 3);
}
"
gp_sim <- stan(model_code = gp_model, data = list(Y = c(0,0)), chains = 4, iter = 20000*2, seed = 6)
## 
## SAMPLING FOR MODEL 'fc1218228fa6ac97b474ffd91a440b81' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 6e-06 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.06 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:     1 / 40000 [  0%]  (Warmup)
## Chain 1: Iteration:  4000 / 40000 [ 10%]  (Warmup)
## Chain 1: Iteration:  8000 / 40000 [ 20%]  (Warmup)
## Chain 1: Iteration: 12000 / 40000 [ 30%]  (Warmup)
## Chain 1: Iteration: 16000 / 40000 [ 40%]  (Warmup)
## Chain 1: Iteration: 20000 / 40000 [ 50%]  (Warmup)
## Chain 1: Iteration: 20001 / 40000 [ 50%]  (Sampling)
## Chain 1: Iteration: 24000 / 40000 [ 60%]  (Sampling)
## Chain 1: Iteration: 28000 / 40000 [ 70%]  (Sampling)
## Chain 1: Iteration: 32000 / 40000 [ 80%]  (Sampling)
## Chain 1: Iteration: 36000 / 40000 [ 90%]  (Sampling)
## Chain 1: Iteration: 40000 / 40000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.070762 seconds (Warm-up)
## Chain 1:                0.074802 seconds (Sampling)
## Chain 1:                0.145564 seconds (Total)
## Chain 1: 
## Warning: There were 3 divergent transitions after warmup. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## to find out why this is a problem and how to eliminate them.
## Warning: Examine the pairs() plot to diagnose sampling problems
# Density plot of the Markov chain values
mcmc_dens(gp_sim, pars = "lambda") + yaxis_text(TRUE) + ylab("density")

# Shade in the middle 95% interval
mcmc_areas(gp_sim, pars = "lambda", prob = 0.95)

I tried several simulations; all show slight discrepancies between the simulated density plots and the true Gamma(1,5) density.

# Sketch plot
knitr::include_graphics("image/CI_2.png")

  1. They are not the same. The interval from part a reports the 95% of posterior values with the highest posterior density, running from the 0th to the 95th percentile, whereas the interval from part b reports the middle 95% of the posterior density, from the 2.5th to the 97.5th percentile.

For the heavily right-skewed Gamma(1,5), the interval from part a is more appropriate: in the middle-95% interval from part b, the values at the upper end are less plausible than the low values of λ below the 2.5th percentile that get left out, whereas the highest-posterior-density approach includes those more plausible values instead.
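Since the Gamma(1,5) density is strictly decreasing, its 95% highest-posterior-density interval simply runs from 0 to the 0.95 quantile; comparing it with the middle-95% interval makes the difference explicit:

```r
# Middle 95%: 2.5th to 97.5th percentiles of Gamma(1,5)
middle_ci <- qgamma(c(0.025, 0.975), 1, 5)
# Highest posterior density: the density is strictly decreasing,
# so the 95% HPD interval is (0, 95th percentile)
hpd_ci <- c(0, qgamma(0.95, 1, 5))
middle_ci
hpd_ci
diff(middle_ci) > diff(hpd_ci)   # the HPD interval is narrower
```

Both intervals have 95% posterior coverage, but the HPD interval is shorter and contains only the most plausible λ values.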

# Since the normal distribution is symmetric, the 95% highest posterior density credible interval for μ covers the same region as the middle 95% of the posterior density, so we can use the "middle 95%" approach to construct the 95% highest posterior density credible interval for μ.

# 0.025th & 0.975th quantiles of the N(-13,2^2) posterior
qnorm(c(0.025, 0.975), -13, 2)
## [1] -16.919928  -9.080072

The resulting 95% credible interval for μ is (-16.919928, -9.080072).

# 0.025th & 0.975th quantiles of the N(-13,2^2) posterior
qnorm(c(0.025, 0.975), -13, 2)
## [1] -16.919928  -9.080072

The resulting 95% credible interval for μ is (-16.919928, -9.080072).

  1. The two intervals from parts d and e are the same. Since the normal model is a symmetric distribution, the 95% highest posterior density credible interval for μ covers the same region as the middle 95% of the posterior density: the range from the 2.5th to the 97.5th percentile reported in part e is exactly the 95% of posterior values with the highest posterior density reported in part d.

Exercise 8.9

# Posterior probability that pi > 0.4
post_prob <- pbeta(0.4, 4, 3, lower.tail = FALSE) 
post_prob
## [1] 0.8208

The posterior probability for the alternative hypothesis is 0.8208.

# Since the posterior probability for the alternative hypothesis is 0.8208, the posterior probability that pi ≤ 0.4 is (1-0.8208).
# Posterior odds
post_odds <- post_prob/(1 - post_prob) 
post_odds
## [1] 4.580357

The posterior odds are 4.580357. That is, our posterior assessment is that π is nearly 5 times more likely to be above 0.4 than at or below 0.4. The posterior probability of π being above 0.4 is roughly 82 in 100.

# Prior probability that pi > 0.4
prior_prob <- pbeta(0.4, 1, 0.8, lower.tail = FALSE) 
prior_prob
## [1] 0.6645398
# The prior probability that pi ≤ 0.4 is (1-0.6645398)=0.3354602.
# Prior_odds
prior_odds <- prior_prob / (1 - prior_prob)
prior_odds
## [1] 1.98098

So the prior odds are 1.98098. That is, our prior assessment is that π is nearly 2 times more likely to be above 0.4 than at or below 0.4. The prior probability of π being above 0.4 is roughly 66 in 100.

# Bayes factor
BF <- post_odds / prior_odds
BF
## [1] 2.312168

So the Bayes Factor is 2.312168, which indicates that the plausibility of the alternative hypothesis that π > 0.4 increased in light of the observed data.

  1. Our understanding about π evolved upon observing the data: our prior assessment was that π is nearly 2 times more likely to be above 0.4 than at or below 0.4; our posterior assessment is that π is nearly 5 times more likely. So, for a Beta(1,0.8) prior and a Beta(4,3) posterior, the observed data provide evidence supporting the alternative hypothesis that π > 0.4 and against the null hypothesis that π ≤ 0.4, since the posterior odds are greater than the prior odds. We felt it was likely that π > 0.4, but we were somewhat unsure; after observing the data, we are more confident that π > 0.4.
plot_beta(1,0.8)

plot_beta(4,3)

# Note: Beta(4,3) = Beta(1 + 3, 0.8 + 5.2 - 3), so the implied data are y = 3
# successes in n = 5.2 trials; the non-integer n triggers the NaN warnings below.
plot_beta_binomial(1,0.8,3,5.2)
## Warning in dbinom(x = y_data, size = n, prob = x): NaNs produced
## Warning: Computation failed in `stat_function()`:
## non-finite function value
## Warning in dbinom(x = y_data, size = n, prob = x): NaNs produced
## Warning: Computation failed in `stat_function()`:
## non-finite function value

Exercise 8.10

# Posterior probability that μ < 5.2
post_prob_1 <- pnorm(5.2, 5, 3) 
post_prob_1
## [1] 0.5265765

The posterior probability for the alternative hypothesis is 0.5265765.

# Since the posterior probability for the alternative hypothesis is 0.5265765, the posterior probability that μ ≥ 5.2 is (1-0.5265765).
# Posterior odds
post_odds_1 <- post_prob_1/(1 - post_prob_1) 
post_odds_1 
## [1] 1.112274

The posterior odds are 1.112274. That is, our posterior assessment is that μ is about 1.1 times more likely to be below 5.2 than at or above 5.2. The posterior probability of μ being below 5.2 is roughly 53 in 100.

# Prior probability that μ < 5.2
prior_prob_1 <- pnorm(5.2, 10, 10) 
prior_prob_1
## [1] 0.3156137
# The prior probability that μ ≥ 5.2 is (1-0.3156137)=0.6843863.
# Prior_odds
prior_odds_1 <- prior_prob_1 / (1 - prior_prob_1)
prior_odds_1
## [1] 0.4611631

So the prior odds are 0.4611631. That is, our prior assessment is that μ is only about half as likely to be below 5.2 as at or above 5.2. The prior probability of μ being below 5.2 is roughly 32 in 100.

# Bayes factor
BF <- post_odds_1 / prior_odds_1
BF 
## [1] 2.411888

So the Bayes Factor is 2.411888, which indicates that the plausibility of the alternative hypothesis that μ < 5.2 increased in light of the observed data.

  1. Our understanding about μ evolved upon observing the data: our prior assessment was that μ is only about half as likely to be below 5.2 as at or above 5.2; our posterior assessment is that μ is about 1.1 times more likely to be below 5.2 than at or above 5.2. So, for an N(10,10^2) prior and an N(5,3^2) posterior, the observed data provide evidence supporting the alternative hypothesis that μ < 5.2 and against the null hypothesis that μ ≥ 5.2, since the posterior odds are greater than the prior odds. We initially felt it was unlikely that μ < 5.2, but we were somewhat unsure. After observing the data, we think it is somewhat more likely that μ < 5.2, though we remain far from certain.
plot_normal(10,10)

plot_normal(5,3)

Exercise 8.14

  1. The Beta-Binomial model is appropriate for this analysis: π, the proportion of U.S. adults who do not believe in climate change, lives on [0,1], and Y, the number of the n surveyed adults who don’t believe in climate change, is a count of successes in n fixed, independent trials, each with probability of success π. The Beta-Binomial model applies to exactly such settings, with any tuning of a Beta prior on π.

  2. My prior model for π is Beta(1,1) since this is a situation in which I have no idea about the parameter π. I think π is equally likely to be anywhere between 0 and 1.

plot_beta(1,1)

plot_beta(1,2)

My prior understanding differs from the authors’ in that, while I think it’s equally plausible for π to take any value between 0 and 1, the authors think it more likely that only a small proportion of U.S. adults do not believe in climate change, though they are somewhat unsure: under their Beta(1,2) prior, π is most likely around 0 but could reasonably range from 0 to 1.

# Load data
data("pulse_of_the_nation")

pulse_of_the_nation %>% 
  tabyl(climate_change) %>% 
  adorn_totals("row")
##                 climate_change    n percent
##                Not Real At All  150   0.150
##      Real and Caused by People  655   0.655
##  Real but not Caused by People  195   0.195
##                          Total 1000   1.000

The sample proportion of surveyed U.S. adults with the opinion that climate change is not real at all is 0.150.

# Posterior model of π: Beta(151,852)
summarize_beta_binomial(1,2,150,1000)
##       model alpha beta      mean      mode          var         sd
## 1     prior     1    2 0.3333333 0.0000000 0.0555555556 0.23570226
## 2 posterior   151  852 0.1505484 0.1498501 0.0001273741 0.01128601
# 0.025th & 0.975th quantiles of the Beta(151,852) posterior
qbeta(c(0.025, 0.975), 151,852)
## [1] 0.1291000 0.1733164

The resulting 95% credible interval for π is (0.1291000, 0.1733164), meaning the fraction of posterior plausible π values that fall in this region is 0.95. This range captures the more plausible values of π while eliminating the more extreme and unlikely scenarios: there’s a 95% posterior probability that somewhere between about 12.9% and 17.3% of U.S. adults have the opinion that climate change is not real at all.

Exercise 8.15

plot_beta(151,852)

Given that the 95% credible interval for π is (0.129, 0.173), there’s a 95% posterior probability that somewhere between 12.9% and 17.3% of U.S. adults think climate change is not real at all. Thus we have ample evidence in favor of Ha: the hypothesized region π ≤ 0.1 lies substantially outside the posterior credible interval, while π > 0.1 overlaps the range of plausible π values. This makes me believe that the proportion of U.S. adults who don’t believe in climate change is very unlikely to be at or below 0.1, so I favor Ha.

# Posterior probability that π > 0.1
post_prob_3 <- pbeta(0.1, 151,852, lower.tail = FALSE) 
post_prob_3
## [1] 0.9999997

The posterior probability for the alternative hypothesis is 0.9999997, which reveals strong evidence in favor of the researcher’s claim: there’s a greater than 99.99% posterior chance that the proportion of U.S. adults who don’t believe in climate change is above 0.1.

# Posterior odds
post_odds_3 <- post_prob_3/(1 - post_prob_3) 
post_odds_3 
## [1] 3197444
# Prior probability that π > 0.1
prior_prob_3 <- pbeta(0.1, 1,2, lower.tail = FALSE)
prior_prob_3
## [1] 0.81
# Prior_odds
prior_odds_3 <- prior_prob_3 / (1 - prior_prob_3)
prior_odds_3
## [1] 4.263158
# Bayes factor
BF_3 <- post_odds_3 / prior_odds_3
BF_3 
## [1] 750017.6

So the Bayes Factor is 750017.6, which indicates that the plausibility of the alternative hypothesis that π > 0.1 increased enormously in light of the observed data. Bringing it all together, the posterior probability (0.9999997) and Bayes Factor (750017.6) establish convincing evidence in favor of the claim that more than 10% of U.S. adults do not believe in climate change.

  1. Our understanding about π evolved substantially upon observing the data: our prior assessment was that π is about 4 times more likely to be above 0.1 than at or below 0.1; our posterior assessment is that π is about 3197444 times more likely to be above 0.1. So, for a Beta(1,2) prior and a Beta(151,852) posterior, the observed data provide strong evidence for the alternative hypothesis that π > 0.1 and against the null hypothesis that π ≤ 0.1, since the posterior odds are vastly greater than the prior odds. We initially felt it was likely that π > 0.1, but we were somewhat unsure. After observing the data, we are very confident that π > 0.1 and that π ≤ 0.1 is extremely unlikely.
plot_beta_binomial(1,2,150,1000)

Exercise 8.16

# STEP 1: DEFINE the model
no_cc_model <- "
data {
  int<lower = 0, upper = 1000> Y;
}
parameters {
  real<lower = 0, upper = 1> pi;
}
model {
  Y ~ binomial(1000, pi);
  pi ~ beta(1, 2);
}
"

# STEP 2: SIMULATE the posterior
no_cc_sim <- stan(model_code = no_cc_model, data = list(Y = 150),
chains = 4, iter = 10000, seed = 616)
## 
## SAMPLING FOR MODEL '7045169b2d94829e357e76f561ac09b4' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 5e-06 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.05 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 10000 [  0%]  (Warmup)
## Chain 1: Iteration: 1000 / 10000 [ 10%]  (Warmup)
## Chain 1: Iteration: 2000 / 10000 [ 20%]  (Warmup)
## Chain 1: Iteration: 3000 / 10000 [ 30%]  (Warmup)
## Chain 1: Iteration: 4000 / 10000 [ 40%]  (Warmup)
## Chain 1: Iteration: 5000 / 10000 [ 50%]  (Warmup)
## Chain 1: Iteration: 5001 / 10000 [ 50%]  (Sampling)
## Chain 1: Iteration: 6000 / 10000 [ 60%]  (Sampling)
## Chain 1: Iteration: 7000 / 10000 [ 70%]  (Sampling)
## Chain 1: Iteration: 8000 / 10000 [ 80%]  (Sampling)
## Chain 1: Iteration: 9000 / 10000 [ 90%]  (Sampling)
## Chain 1: Iteration: 10000 / 10000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.018348 seconds (Warm-up)
## Chain 1:                0.021554 seconds (Sampling)
## Chain 1:                0.039902 seconds (Total)
## Chain 1: 
# trace plots 
mcmc_trace(no_cc_sim, pars = "pi", size = 0.5) + xlab("iteration")

# density plots
mcmc_dens_overlay(no_cc_sim, pars = "pi")

# autocorrelation plot
mcmc_acf(no_cc_sim, pars = "pi")

The randomness in the trace plots and the agreement among the density plots of the four parallel chains suggest that the simulation is very stable, and that the dependent chains are behaving “enough” like an independent sample. The autocorrelation plots for the four chains drop off quickly and are effectively 0 by lag 5, indicating that the Markov chains are mixing quickly, i.e., quickly moving around the range of posterior plausible π values, and thus at least mimicking an independent sample. The simulation therefore produces a good approximation of the real posterior.

# Calculate the effective sample size ratio
neff_ratio(no_cc_sim, pars = c("pi"))
## [1] 0.3953379

The effective sample size ratio means that the accuracy of using our 20,000-length Markov chain to approximate the posterior of π is roughly as great as if we had used only 39.5% as many independent values. This ratio is satisfyingly high (above 0.1): our 20,000 Markov chain values are about as effective as roughly 7,906 independent samples (0.3953379 × 20,000).

# Calculate R-hat ratio
rhat(no_cc_sim, pars = "pi")
## [1] 1.000763

The R-hat metric calculates the ratio between the variability in π across all four chains combined and the typical variability within any individual chain. The R-hat value of approximately 1 (< 1.05) here reflects stability across the parallel chains.

The effective sample size ratio and R-hat value for the simulation also indicate that the Markov chains are mixing quickly and the simulation has stabilized to approximate the real posterior.

Exercise 8.17

# The actual Beta(151,852) posterior
plot_beta(151, 852) +
  lims(x = c(0, 0.25))

# MCMC posterior approximation
mcmc_dens(no_cc_sim, pars = "pi") +
  lims(x = c(0, 0.25))
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.

tidy(no_cc_sim, conf.int = TRUE, conf.level = 0.95)
## # A tibble: 1 × 5
##   term  estimate std.error conf.low conf.high
##   <chr>    <dbl>     <dbl>    <dbl>     <dbl>
## 1 pi       0.150    0.0113    0.129     0.173
mcmc_areas(no_cc_sim, pars = "pi", prob = 0.95)

The approximated (middle) 95% posterior credible interval for π is (0.129444,0.1732492).

# Store the 4 chains in 1 data frame
no_cc_chains_df <- as.data.frame(no_cc_sim, pars="lp__", include = FALSE)
dim(no_cc_chains_df)
## [1] 20000     1
# Tabulate pi values that are above 0.1
no_cc_chains_df %>%
  mutate(exceeds = pi > 0.1) %>% 
  tabyl(exceeds)
##  exceeds     n percent
##     TRUE 20000       1

The approximated posterior probability that π>0.1 is 1.

  1. The approximation of the (middle) 95% posterior credible interval for π in part a, (0.129444, 0.1732492), is very close to the actual (middle) 95% posterior credible interval for π calculated in Exercise 8.14, (0.1291000, 0.1733164): the 2.5th percentile of the Markov chain values is only about 0.000344 higher than the actual 2.5th percentile, and the 97.5th percentile is only about 0.0000672 lower than the actual 97.5th percentile.

The approximation of the posterior probability that π > 0.1 in part b (1.0) is also very close to the actual posterior probability that π > 0.1 calculated in Exercise 8.15 (0.9999997), only about 0.0000003 higher.
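As a quick cross-check, the exact posterior probability can be recomputed directly from the Beta(151, 852) posterior (the same quantity reported in Exercise 8.15):

```r
# Exact posterior probability that pi > 0.1 under the Beta(151, 852) posterior
1 - pbeta(0.1, 151, 852)
## matches the 0.9999997 reported in Exercise 8.15
```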

Exercise 8.18

# Set the seed
set.seed(6)

# Predict a value of Y' for each pi value in the chain
no_cc_chains_df <- no_cc_chains_df %>% 
  mutate(y_predict = rbinom(length(pi), size = 100, prob = pi))

# Check it out
no_cc_chains_df %>% 
  head(3)
##          pi y_predict
## 1 0.1507204        16
## 2 0.1475618        20
## 3 0.1424637        12
# Plot the 20000 predictions
ggplot(no_cc_chains_df, aes(x = y_predict)) + 
  stat_count()

  1. The resulting collection of 20,000 predictions closely approximates the true posterior predictive distribution. It’s most likely that 14 of the 100 additional surveyed U.S. adults don’t believe in climate change, though Y’ might reasonably range between 2 and 33.

# Tabulate Y' values that are at or above 20
no_cc_chains_df %>%
  mutate(y_predict_20 = y_predict > 19) %>% 
  tabyl(y_predict_20)
##  y_predict_20     n percent
##         FALSE 17501 0.87505
##          TRUE  2499 0.12495

The approximated probability that at least 20 of the 100 people don’t believe in climate change is 0.12495.

Exercise 8.19

  1. The Normal-Normal model is appropriate for this analysis since the flipper lengths (in mm) among the Adelie penguin species, Y, are measured on a continuous scale, are not restricted to (0, 1), and are often symmetrically (roughly normally) distributed around some typical average length, here μ.

# Since for the prior Normal(μ,σ^2), E(Y) = Mode(Y) = μ and the prior understanding is that the average flipper length for all Adelie penguins is about 200mm, μ could be 200. 

# Since roughly 95% of Y values will be within 2 standard deviations of μ (μ ± 2σ), and the prior understanding is that the average flipper length for all Adelie penguins could be as low as 140mm or as high as 260mm, μ + 2σ ≈ 260 and μ - 2σ ≈ 140, so σ could be around 30.

# Check it out N(200, 30^2)
plot_normal(200, 30)

# Make minor modifications on the variability accordingly
plot_normal(200, 20)

So an appropriate prior model for μ could be N(200,20^2).
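As a quick sanity check on the narrowed N(200, 20^2) prior, its central 95% range can be computed directly:

```r
# Central 95% range implied by the N(200, 20^2) prior
qnorm(c(0.025, 0.975), 200, 20)
## roughly 161 to 239
```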

penguins_bayes %>% 
  group_by(species) %>% 
  summarize(average = mean(flipper_length_mm, na.rm = TRUE),
            sd = sd(flipper_length_mm, na.rm = TRUE),
            number = n())
## # A tibble: 3 × 4
##   species   average    sd number
##   <fct>       <dbl> <dbl>  <int>
## 1 Adelie       190.  6.54    152
## 2 Chinstrap    196.  7.13     68
## 3 Gentoo       217.  6.48    124

For the Adelie species, there are 152 data points and the sample mean flipper_length_mm is 189.9536, about 190.

summarize_normal_normal(200,20,6.54,190,152)
##       model    mean    mode         var         sd
## 1     prior 200.000 200.000 400.0000000 20.0000000
## 2 posterior 190.007 190.007   0.2811943  0.5302776

So the posterior is N(190, 0.53^2).
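The posterior parameters reported by summarize_normal_normal() can also be verified by hand with the standard Normal-Normal updating formulas (a sketch, treating the data standard deviation as known and using the same inputs as above):

```r
# Normal-Normal updating: prior N(theta, tau^2), data sd sigma,
# n observations with sample mean y_bar
theta <- 200; tau <- 20; sigma <- 6.54; y_bar <- 190; n <- 152
post_var  <- tau^2 * sigma^2 / (n * tau^2 + sigma^2)
post_mean <- theta * sigma^2 / (n * tau^2 + sigma^2) +
  y_bar * n * tau^2 / (n * tau^2 + sigma^2)
c(mean = post_mean, sd = sqrt(post_var))
## roughly mean = 190.007, sd = 0.530 -- matching the table above
```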

# 0.025th & 0.975th quantiles of the N(190,0.53^2) posterior
qnorm(c(0.025, 0.975), 190, 0.53)
## [1] 188.9612 191.0388

The (middle) 95% posterior credible interval for μ is (188.9612,191.0388). It means the fraction of μ values that fall into this region is 0.95. This range captures the more plausible values of μ while eliminating the more extreme and unlikely scenarios. It indicates that there’s a 95% posterior probability that the typical flipper length (in mm) among the Adelie penguin species is somewhere between about 188.9612 and 191.0388.

Exercise 8.20

  1. I hypothesize that the average Adelie flipper length is somewhere between 200mm and 220mm. Therefore I frame my analysis as two competing hypotheses: the null hypothesis H0 contends that the average Adelie flipper length (in mm) is below 200 or above 220, whereas the alternative hypothesis Ha (my claim) contends that this figure is between 200 and 220. In mathematical notation:

H0 : μ < 200 | μ > 220

Ha : 200 ≤ μ ≤ 220

  1. Given that the 95% credible interval for μ is (188.9612, 191.0388), there’s a 95% posterior probability that the typical flipper length (in mm) among the Adelie penguin species is somewhere between about 188.9612 and 191.0388. The fact that the hypothesized range 200 ≤ μ ≤ 220 lies entirely outside the posterior credible interval, while μ < 200 or μ > 220 overlaps the range of plausible μ values, makes me believe that the typical flipper length among Adelie penguins is very unlikely to be between 200 and 220. So we have ample evidence in favor of H0, and I’m more in favor of H0.

# Posterior probability that 200 ≤ μ ≤ 220
post_prob_4 <- pnorm(220, mean=190, sd=0.53) - pnorm(200, mean=190, sd=0.53)
post_prob_4
## [1] 0

The posterior probability of the alternative hypothesis is 0, which reveals strong evidence against my claim: there’s essentially no posterior chance that the typical flipper length (in mm) among the Adelie penguin species is between 200 and 220.

post_prob_4 / (1-post_prob_4)
## [1] 0
prior_prob_4 <- pnorm(220, mean=200, sd=20) - pnorm(200, mean=200, sd=20)
prior_prob_4
## [1] 0.3413447
prior_prob_4 / (1-prior_prob_4)
## [1] 0.5182449

My understanding of μ evolved upon observing the data. My prior assessment was that μ is most likely around 200 but could reasonably lie between 140 and 260, which makes 200 ≤ μ ≤ 220 plausible, though I was somewhat unsure (μ < 200 or μ > 220 was plausible too). After observing the sample data, my posterior assessment is that μ is essentially certain not to lie between 200 and 220. So, for the N(200,20^2) prior and the N(190,0.53^2) posterior, the observed data provide evidence against the alternative hypothesis and in support of the null hypothesis that μ < 200 | μ > 220, since the prior odds (0.5182449) are so much greater than the posterior odds (0). Given that the Bayes factor is 0 (posterior odds / prior odds = 0 / 0.5182449), the plausibility of Ha decreased to essentially 0 in light of the observed data.
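The Bayes factor calculation above can be written out in one self-contained snippet (recomputing both probabilities so it stands alone; variable names here are illustrative):

```r
# Bayes factor = posterior odds / prior odds for Ha: 200 <= mu <= 220
post_prob  <- pnorm(220, 190, 0.53) - pnorm(200, 190, 0.53)
prior_prob <- pnorm(220, 200, 20)   - pnorm(200, 200, 20)
(post_prob / (1 - post_prob)) / (prior_prob / (1 - prior_prob))
## [1] 0
```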

Exercise 8.21

  1. The Gamma-Poisson model is appropriate for this analysis since λ, the typical number of loons observed by a birdwatcher across a 100-hour observation period, is a rate parameter that can take on any positive value, and each data point Y is a bird count collected on a separate outing, measured on a discrete scale with technically no upper limit and not limited by the number of outings n.

# Since for the prior Gamma(s, r), E(λ) = s/r, Var(λ)= s/r^2 and the prior understanding is that typical rate of loon sightings is 2 per 100 hours with a standard deviation of 1 per 100-hours, s could be 4, r could be 2.


# Check it out Gamma(4, 2)
plot_gamma(4,2)

So an appropriate prior model for λ is Gamma(4, 2).
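A quick check that these shape and rate choices reproduce the stated prior understanding:

```r
# Gamma(s, r) tuning: mean s/r should be 2 loons per 100 hours,
# sd sqrt(s/r^2) should be 1
s <- 4; r <- 2
c(mean = s / r, sd = sqrt(s / r^2))
## mean = 2, sd = 1
```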

loons %>% 
  view()
loons %>% 
  summarize(average_loon = mean(count_per_100, na.rm = TRUE),
            sd = sd(count_per_100, na.rm = TRUE),
            sum_loon = sum(count_per_100, na.rm = TRUE),
            number = n())
## # A tibble: 1 × 4
##   average_loon    sd sum_loon number
##          <dbl> <dbl>    <dbl>  <int>
## 1          1.5  1.58       27     18

We have 18 data points and the average loon count per 100 hours is 1.5.

summarize_gamma_poisson(4,2,27,18)
##       model shape rate mean mode    var        sd
## 1     prior     4    2 2.00  1.5 1.0000 1.0000000
## 2 posterior    31   20 1.55  1.5 0.0775 0.2783882

So the posterior is Gamma(31, 20).
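This matches the Gamma-Poisson conjugate updating rule, which can be sketched directly from the same inputs passed to summarize_gamma_poisson():

```r
# Conjugate update: Gamma(s, r) prior + Poisson data -> Gamma(s + sum(y), r + n)
s <- 4; r <- 2; sum_y <- 27; n <- 18
c(shape = s + sum_y, rate = r + n)
## shape = 31, rate = 20
```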

# 0.025th & 0.975th quantiles of the Gamma(31,20) posterior
qgamma(c(0.025, 0.975), 31,20)
## [1] 1.053150 2.141343

The (middle) 95% posterior credible interval for λ is (1.053150,2.141343). It means the fraction of λ values that fall into this region is 0.95. This range captures the more plausible values of λ while eliminating the more extreme and unlikely scenarios. It indicates that there’s a 95% posterior probability that the typical number of loons observed by a birdwatcher across a 100-hour observation period is somewhere between about 1.053150 and 2.141343.

Exercise 8.22

  1. I hypothesize that birdwatchers should anticipate a rate of less than 1 loon per observation period. Therefore I frame my analysis as two competing hypotheses: the null hypothesis H0 contends that the typical rate of loons observed by a birdwatcher per 100-hour observation period is at or above 1, whereas the alternative hypothesis Ha (my claim) contends that this rate is below 1. In mathematical notation:

H0 : λ ≥ 1

Ha : λ < 1

  1. Given that the 95% credible interval for λ is (1.053150, 2.141343), there’s a 95% posterior probability that the typical rate of loons observed by a birdwatcher per 100-hour observation period is somewhere between about 1.053150 and 2.141343. The fact that the hypothesized range λ < 1 lies entirely outside the posterior credible interval, while λ ≥ 1 overlaps the range of plausible λ values, makes me believe that this rate is very unlikely to be less than 1. So we have ample evidence in favor of H0, and I’m more in favor of H0.

# Posterior probability that λ < 1
post_prob_5 <- pgamma(1,31,20)
post_prob_5
## [1] 0.01347468

The posterior probability for the alternative hypothesis is 0.01347468, which reveals weak evidence supporting my claim: there’s only a 0.01347468 posterior chance that the typical rate of loons observed by a birdwatcher per 100-hour observation period is less than 1.

post_odds_5 <- post_prob_5 / (1-post_prob_5)
post_odds_5
## [1] 0.01365873
prior_prob_5 <- pgamma(1,4,2)
prior_prob_5
## [1] 0.1428765
prior_odds_5 <- prior_prob_5 / (1-prior_prob_5)
prior_odds_5
## [1] 0.1666931
BF_5 <- post_odds_5/prior_odds_5
BF_5
## [1] 0.08193939

My understanding of λ evolved upon observing the data. My prior assessment was that λ is most likely around 2 but could vary with a standard deviation of 1, which makes λ < 1 plausible, though I was somewhat unsure (λ ≥ 1 was plausible too). After observing the sample data, my posterior assessment is that λ is very unlikely to be less than 1. So, for the Gamma(4,2) prior and the Gamma(31,20) posterior, the observed data provide evidence against the alternative hypothesis and in support of the null hypothesis that λ ≥ 1, since the prior odds (0.1666931) are greater than the posterior odds (0.01365873). Given that the Bayes factor is 0.08193939, the plausibility of Ha decreased in light of the observed data.

Exercise 8.23

# DEFINE the model
loon_model <- " 
data {
  int<lower = 0> Y[18]; 
  }
parameters {
  real<lower = 0> lambda;
}
model {
  Y ~ poisson(lambda);
  lambda ~ gamma(4, 2); 
  }
"
# SIMULATE the posterior
loon_sim <- stan(model_code = loon_model, data = list(Y = c(0,5,0,0,0,2,0,2,1,1,0,4,2,0,3,3,1,3)), 
               chains = 4, iter = 5000*2, seed = 616)
## 
## SAMPLING FOR MODEL '3bf367408175d50d84223aeeceb0ef99' NOW (CHAIN 1).
## Chain 1: 
## Chain 1: Gradient evaluation took 7e-06 seconds
## Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 0.07 seconds.
## Chain 1: Adjust your expectations accordingly!
## Chain 1: 
## Chain 1: 
## Chain 1: Iteration:    1 / 10000 [  0%]  (Warmup)
## Chain 1: Iteration: 1000 / 10000 [ 10%]  (Warmup)
## Chain 1: Iteration: 2000 / 10000 [ 20%]  (Warmup)
## Chain 1: Iteration: 3000 / 10000 [ 30%]  (Warmup)
## Chain 1: Iteration: 4000 / 10000 [ 40%]  (Warmup)
## Chain 1: Iteration: 5000 / 10000 [ 50%]  (Warmup)
## Chain 1: Iteration: 5001 / 10000 [ 50%]  (Sampling)
## Chain 1: Iteration: 6000 / 10000 [ 60%]  (Sampling)
## Chain 1: Iteration: 7000 / 10000 [ 70%]  (Sampling)
## Chain 1: Iteration: 8000 / 10000 [ 80%]  (Sampling)
## Chain 1: Iteration: 9000 / 10000 [ 90%]  (Sampling)
## Chain 1: Iteration: 10000 / 10000 [100%]  (Sampling)
## Chain 1: 
## Chain 1:  Elapsed Time: 0.01837 seconds (Warm-up)
## Chain 1:                0.022397 seconds (Sampling)
## Chain 1:                0.040767 seconds (Total)
## Chain 1: 
## 
## SAMPLING FOR MODEL '3bf367408175d50d84223aeeceb0ef99' NOW (CHAIN 2).
## Chain 2: 
## Chain 2: Gradient evaluation took 2e-06 seconds
## Chain 2: 1000 transitions using 10 leapfrog steps per transition would take 0.02 seconds.
## Chain 2: Adjust your expectations accordingly!
## Chain 2: 
## Chain 2: 
## Chain 2: Iteration:    1 / 10000 [  0%]  (Warmup)
## Chain 2: Iteration: 1000 / 10000 [ 10%]  (Warmup)
## Chain 2: Iteration: 2000 / 10000 [ 20%]  (Warmup)
## Chain 2: Iteration: 3000 / 10000 [ 30%]  (Warmup)
## Chain 2: Iteration: 4000 / 10000 [ 40%]  (Warmup)
## Chain 2: Iteration: 5000 / 10000 [ 50%]  (Warmup)
## Chain 2: Iteration: 5001 / 10000 [ 50%]  (Sampling)
## Chain 2: Iteration: 6000 / 10000 [ 60%]  (Sampling)
## Chain 2: Iteration: 7000 / 10000 [ 70%]  (Sampling)
## Chain 2: Iteration: 8000 / 10000 [ 80%]  (Sampling)
## Chain 2: Iteration: 9000 / 10000 [ 90%]  (Sampling)
## Chain 2: Iteration: 10000 / 10000 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 0.018688 seconds (Warm-up)
## Chain 2:                0.018851 seconds (Sampling)
## Chain 2:                0.037539 seconds (Total)
## Chain 2: 
## 
## SAMPLING FOR MODEL '3bf367408175d50d84223aeeceb0ef99' NOW (CHAIN 3).
## Chain 3: 
## Chain 3: Gradient evaluation took 3e-06 seconds
## Chain 3: 1000 transitions using 10 leapfrog steps per transition would take 0.03 seconds.
## Chain 3: Adjust your expectations accordingly!
## Chain 3: 
## Chain 3: 
## Chain 3: Iteration:    1 / 10000 [  0%]  (Warmup)
## Chain 3: Iteration: 1000 / 10000 [ 10%]  (Warmup)
## Chain 3: Iteration: 2000 / 10000 [ 20%]  (Warmup)
## Chain 3: Iteration: 3000 / 10000 [ 30%]  (Warmup)
## Chain 3: Iteration: 4000 / 10000 [ 40%]  (Warmup)
## Chain 3: Iteration: 5000 / 10000 [ 50%]  (Warmup)
## Chain 3: Iteration: 5001 / 10000 [ 50%]  (Sampling)
## Chain 3: Iteration: 6000 / 10000 [ 60%]  (Sampling)
## Chain 3: Iteration: 7000 / 10000 [ 70%]  (Sampling)
## Chain 3: Iteration: 8000 / 10000 [ 80%]  (Sampling)
## Chain 3: Iteration: 9000 / 10000 [ 90%]  (Sampling)
## Chain 3: Iteration: 10000 / 10000 [100%]  (Sampling)
## Chain 3: 
## Chain 3:  Elapsed Time: 0.018396 seconds (Warm-up)
## Chain 3:                0.018835 seconds (Sampling)
## Chain 3:                0.037231 seconds (Total)
## Chain 3: 
## 
## SAMPLING FOR MODEL '3bf367408175d50d84223aeeceb0ef99' NOW (CHAIN 4).
## Chain 4: 
## Chain 4: Gradient evaluation took 3e-06 seconds
## Chain 4: 1000 transitions using 10 leapfrog steps per transition would take 0.03 seconds.
## Chain 4: Adjust your expectations accordingly!
## Chain 4: 
## Chain 4: 
## Chain 4: Iteration:    1 / 10000 [  0%]  (Warmup)
## Chain 4: Iteration: 1000 / 10000 [ 10%]  (Warmup)
## Chain 4: Iteration: 2000 / 10000 [ 20%]  (Warmup)
## Chain 4: Iteration: 3000 / 10000 [ 30%]  (Warmup)
## Chain 4: Iteration: 4000 / 10000 [ 40%]  (Warmup)
## Chain 4: Iteration: 5000 / 10000 [ 50%]  (Warmup)
## Chain 4: Iteration: 5001 / 10000 [ 50%]  (Sampling)
## Chain 4: Iteration: 6000 / 10000 [ 60%]  (Sampling)
## Chain 4: Iteration: 7000 / 10000 [ 70%]  (Sampling)
## Chain 4: Iteration: 8000 / 10000 [ 80%]  (Sampling)
## Chain 4: Iteration: 9000 / 10000 [ 90%]  (Sampling)
## Chain 4: Iteration: 10000 / 10000 [100%]  (Sampling)
## Chain 4: 
## Chain 4:  Elapsed Time: 0.018124 seconds (Warm-up)
## Chain 4:                0.018461 seconds (Sampling)
## Chain 4:                0.036585 seconds (Total)
## Chain 4:
# trace plots 
mcmc_trace(loon_sim, pars = "lambda", size = 0.5) + xlab("iteration")

# density plots
mcmc_dens_overlay(loon_sim, pars = "lambda")

# autocorrelation plot
mcmc_acf(loon_sim, pars = "lambda")

# Calculate the effective sample size ratio
neff_ratio(loon_sim, pars = c("lambda"))
## [1] 0.3572974
# Calculate R-hat ratio
rhat(loon_sim, pars = "lambda")
## [1] 1.000068

There are no strong trends or flat lines in the trace plots. We see consistency across the four chains in the density plots: they exhibit similar features and produce nearly indistinguishable posterior approximations. The autocorrelations quickly drop off and are effectively 0 by lag 5, which further confirms that our Markov chains are mixing quickly, i.e., quickly moving around the range of posterior plausible λ values, and thus at least mimicking an independent sample. The effective sample size ratio means that the accuracy of using our 20,000-length Markov chain to approximate the posterior of λ is roughly as great as if we had used only 35.7% as many independent values. This ratio is satisfyingly high (above 0.1): our 20,000 Markov chain values are about as effective as roughly 7,146 independent samples (0.3572974 × 20,000). The R-hat metric calculates the ratio between the variability in λ across all four chains combined and the typical variability within any individual chain; the R-hat value of approximately 1 (< 1.05) reflects stability across the parallel chains. All this indicates that the Markov chains are mixing quickly and the simulation has stabilized.

# The actual Gamma(31,20) posterior
plot_gamma(31, 20) +
  lims(x = c(0, 3))

# MCMC posterior approximation
mcmc_dens(loon_sim, pars = "lambda") +
  lims(x = c(0, 3))
## Scale for 'x' is already present. Adding another scale for 'x', which will
## replace the existing scale.

tidy(loon_sim, conf.int = TRUE, conf.level = 0.95)
## # A tibble: 1 × 5
##   term   estimate std.error conf.low conf.high
##   <chr>     <dbl>     <dbl>    <dbl>     <dbl>
## 1 lambda     1.53     0.280     1.05      2.14
mcmc_areas(loon_sim, pars = "lambda", prob = 0.95)

The approximated (middle) 95% posterior credible interval for λ is (1.050636,2.142541).

# Store the 4 chains in 1 data frame
loon_chains_df <- as.data.frame(loon_sim, pars="lp__", include = FALSE)
dim(loon_chains_df)
## [1] 20000     1
# Tabulate lambda values that are below 1
loon_chains_df %>%
  mutate(below = lambda < 1) %>% 
  tabyl(below)
##  below     n percent
##  FALSE 19718  0.9859
##   TRUE   282  0.0141

The approximated posterior probability that λ < 1 is 0.0141.

  1. The approximation of the (middle) 95% posterior credible interval for λ in part c, (1.050636, 2.142541), is very close to the actual (middle) 95% posterior credible interval for λ calculated in Exercise 8.21, (1.053150, 2.141343): the 2.5th percentile of the Markov chain values is only about 0.002514 lower than the actual 2.5th percentile, and the 97.5th percentile is only about 0.001198 higher than the actual 97.5th percentile.

The approximation of the posterior probability that λ < 1 in part d (0.0141) is also very close to the actual posterior probability that λ < 1 calculated in Exercise 8.22 (0.01347468), only about 0.00062532 higher.

Exercise 8.24

# Set the seed
set.seed(666)

# Predict a value of Y' for each lambda value in the chain
loon_chains_df <- loon_chains_df %>% 
  mutate(y_predict_1 = rpois(n=20000, lambda = lambda))

# Check it out
loon_chains_df %>% 
  head(3)
##     lambda y_predict_1
## 1 1.380448           2
## 2 1.397318           0
## 3 1.582657           5
# Plot the 20000 predictions
ggplot(loon_chains_df, aes(x = y_predict_1)) + 
  stat_count()

  1. The resulting collection of 20000 predictions closely approximates the true posterior predictive distribution. It’s most likely that a birdwatcher will spy 1 loon in their next observation period, though Y’ might reasonably range between 0 and 9.

# Tabulate the Y' values (to read off the share that equal 0)
loon_chains_df %>%
  mutate(y_predict_0 = y_predict_1) %>% 
  tabyl(y_predict_0)
##  y_predict_0    n percent
##            0 4335 0.21675
##            1 6430 0.32150
##            2 5003 0.25015
##            3 2637 0.13185
##            4 1097 0.05485
##            5  376 0.01880
##            6   86 0.00430
##            7   29 0.00145
##            8    6 0.00030
##            9    1 0.00005

The approximated probability that the birdwatcher observes 0 loons in their next observation period is 0.21675.
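This simulated value can be checked against the exact posterior predictive: mixing a Poisson likelihood over a Gamma(s, r) posterior yields a Negative Binomial predictive with size s and probability r/(r + 1), so P(Y' = 0) = (r/(r + 1))^s.

```r
# Exact posterior predictive P(Y' = 0) under the Gamma(31, 20) posterior
dnbinom(0, size = 31, prob = 20 / 21)
## roughly 0.22, close to the simulated 0.21675
```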